Abstract: Classification trees based on exhaustive search algorithms tend to be biased towards selecting variables that afford more splits. As a result, such trees should be interpreted with caution. This article presents an algorithm called QUEST that has negligible bias. Its split selection strategy shares similarities with the FACT method, but it yields binary splits, and the final tree can be selected by a direct stopping rule or by pruning. Real and simulated data are used to compare QUEST with the exhaustive search approach. QUEST is shown to be substantially faster, and the size and classification accuracy of its trees are typically comparable to those of exhaustive search.
Key words and phrases: Decision trees, discriminant analysis, machine learning.